Flagging Inland Data - Explore Rolling SD

Summary

CMAR has collected data on several inland bodies of freshwater in Nova Scotia, including lakes and rivers.

CMAR intends to process and publish all inland data under a new “Inland” branch of the Coastal Monitoring Program. Data will be processed in a similar manner to the coastal water quality data, and data flags will be applied using the qaqcmar package.

It is suspected that sensors at some river stations were out of the water for part of the deployment because of low water levels. Data flagging will target the periods when sensors were suspected to be exposed. While sensors were exposed to air, recorded temperatures fluctuated more quickly than when the sensors were submerged.
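A rolling standard deviation captures this behaviour. A sensor in air tracks rapid air-temperature swings, so the spread of readings within a short moving window rises. The Python below is a minimal sketch under stated assumptions. It uses a centered window measured in number of observations, although the qaqcmar rolling SD test may define its window differently (for example, in hours). `rolling_sd` is a hypothetical helper, and the temperatures are synthetic illustration values, not CMAR data.

```python
import statistics
from typing import List, Optional


def rolling_sd(values: List[float], window: int) -> List[Optional[float]]:
    """Centered rolling sample standard deviation (hypothetical helper).

    `window` is a number of observations and must be odd so the window can
    be centered on each point. Positions where the full window does not fit
    (the start and end of the series) return None.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    half = window // 2
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            out.append(None)  # not enough neighbours to fill the window
        else:
            out.append(statistics.stdev(values[i - half:i + half + 1]))
    return out


# Synthetic hourly temperatures (degrees C), for illustration only.
submerged = [15.0, 15.1, 15.2, 15.1, 15.0, 15.1, 15.2]
exposed = [12.0, 16.5, 21.0, 17.5, 11.0, 19.0, 22.5]

sd_sub = rolling_sd(submerged, 3)
sd_exp = rolling_sd(exposed, 3)
print([round(s, 2) for s in sd_sub if s is not None])
print([round(s, 2) for s in sd_exp if s is not None])
```

With these synthetic series, every rolling SD value from the "exposed" series is far larger than any value from the "submerged" series. This separation is what a rolling SD threshold relies on.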

The purpose of this document is to help CMAR determine appropriate data flagging tests and thresholds for freshwater (inland) data. We do not currently have enough freshwater data to conduct an analysis as thorough as the one used to develop tests and thresholds for the coastal water quality data, so thresholds may be chosen more subjectively. Note that this initial threshold analysis was completed on a subset of the data.

Waterbodies included in threshold analysis:
  • Gold River
  • LaHave River
  • Liscomb River
  • Mersey River
  • Musquodoboit River
  • Roseway River
  • Round Hill River
  • Salmon River
  • Tusket River

Stations included in threshold analysis:
  • Gold River 2
  • LaHave River 1, 2, 3
  • Liscomb River 1, 2
  • Mersey River 2
  • Musquodoboit River 1, 2, 3
  • Roseway River 1, 2
  • Round Hill River 1, 2, 3
  • Salmon River 1, 2
  • Tusket River 1, 2, 3
Stations that may have experienced air exposure:
  • Liscomb River 1
  • Liscomb River 2
  • LaHave River 2
  • Mersey River 2
  • Tusket River 3
  • Possibly Musquodoboit River 1 and 2

Data visualization

Station locations

Approximate location of stations included in threshold analysis.

Plot uncleaned station data

Plot cleaned station data

Suspected outliers have been removed from the following datasets:

  • Liscomb River 1
  • Liscomb River 2
  • LaHave River 2
  • Mersey River 2
  • Tusket River 3

The cleaned datasets will be used to generate the rolling standard deviation thresholds.

Statistical overview

Distribution of sd_roll

Distribution of rolling standard deviation (sd_roll) values by station (bin width = 0.25 °C).

Distribution of sd_roll, all stations

Distribution of all rolling standard deviation (sd_roll) values (bin width = 0.25 °C).

Calculate rolling standard deviation thresholds

Compare various methods for calculating thresholds.
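The methods compared in this report fall into two families: a mean-and-standard-deviation method (Mean_sd) and quantile-based methods. The Python below is a minimal sketch of both. It assumes that Mean_sd means the mean of sd_roll plus `k` standard deviations, with `k = 3` as an assumed multiplier rather than a documented CMAR setting. It also assumes that quantiles use linear interpolation, which is R's default (type 7). `mean_sd_threshold` and `quantile_type7` are hypothetical helpers, and the sample values are synthetic.

```python
import math
import statistics
from typing import Sequence


def mean_sd_threshold(sd_roll: Sequence[float], k: float = 3.0) -> float:
    """Threshold = mean + k * sample SD of the rolling SD values.

    k = 3 is an assumed multiplier for illustration.
    """
    return statistics.mean(sd_roll) + k * statistics.stdev(sd_roll)


def quantile_type7(sd_roll: Sequence[float], p: float) -> float:
    """p-th quantile with linear interpolation (same rule as R's default type 7)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(sd_roll)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


# Synthetic, right-skewed rolling SD values (degrees C): mostly small,
# with a short tail of large values such as air exposure would produce.
sd_roll = [0.1, 0.1, 0.2, 0.2, 0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 1.5, 3.0]

print("mean_sd (k=3):", round(mean_sd_threshold(sd_roll), 2))
for p in (0.95, 0.99):
    print(f"quantile p={p}:", round(quantile_type7(sd_roll, p), 2))
```

With right-skewed data, a few large values inflate the standard deviation, so a Mean_sd threshold depends heavily on the tail. A quantile threshold instead fixes the share of observations in the reference data that can exceed it.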

Visualize flagged data

Visualize data flagged using each method to determine which method produces the best results. Here, the thresholds are applied to all of the inland datasets, not only the cleaned datasets used to generate them.

Mean_sd

Quantile

Quantile pooled 0.95

Quantile pooled 0.97

Quantile pooled 0.99

Quantile pooled 0.997

Apply rolling SD threshold

Visualize flagged data - all datasets

Because the sd_roll distributions are right-skewed, the quantile method was used to establish thresholds. Since the overall distribution of the data was relatively similar across stations, the data were pooled to determine a single rolling standard deviation threshold for flagging all inland datasets.

The final threshold chosen was the pooled 99.7th percentile (q99.7) of sd_roll: 2.04 °C.
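Pooling and flagging can be sketched in Python as follows. This is a minimal illustration, not the qaqcmar implementation, and it rests on several assumptions. `pooled_quantile` and `flag_rolling_sd` are hypothetical helpers, and the per-station values are synthetic. The flag codes assume a QARTOD-style scheme (1 = pass, 2 = not evaluated, 3 = suspect/of interest). Values strictly greater than the threshold are flagged. The last step applies the 2.04 threshold reported for the inland data.

```python
import math
from typing import Dict, List, Optional, Sequence


def quantile_type7(values: Sequence[float], p: float) -> float:
    """p-th quantile with linear interpolation (R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def pooled_quantile(by_station: Dict[str, List[Optional[float]]], p: float) -> float:
    """Pool sd_roll from every station, drop missing values, take one quantile."""
    pooled = [v for vals in by_station.values() for v in vals if v is not None]
    return quantile_type7(pooled, p)


def flag_rolling_sd(sd_roll: Sequence[Optional[float]], threshold: float) -> List[int]:
    """Assumed flag codes: 1 = pass, 2 = not evaluated, 3 = suspect/of interest."""
    flags = []
    for v in sd_roll:
        if v is None:
            flags.append(2)  # rolling SD could not be computed (e.g. series edges)
        elif v > threshold:
            flags.append(3)  # faster fluctuation than the pooled threshold allows
        else:
            flags.append(1)
    return flags


# Synthetic sd_roll values (degrees C) for two hypothetical cleaned stations.
cleaned = {
    "station_a": [None, 0.2, 0.3, 0.4, 0.3, None],
    "station_b": [None, 0.1, 0.5, 0.6, 0.2, None],
}
print("pooled q99.7:", round(pooled_quantile(cleaned, 0.997), 3))

# Apply the chosen threshold to any station, including uncleaned ones.
print(flag_rolling_sd([None, 0.4, 2.5, 3.1, 1.9, None], threshold=2.04))
# -> [2, 1, 3, 3, 1, 2]
```

The threshold is computed once from the pooled cleaned data. It is then applied unchanged to every station, which matches the single-threshold approach described above.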

Rolling SD flag summary table
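A per-station summary of rolling SD flags can be tabulated by counting flag values. The Python below is a minimal sketch. `summarize_flags` is a hypothetical helper, and the input flag vectors are synthetic. The flag codes are assumed QARTOD-style values (1 = pass, 2 = not evaluated, 3 = suspect/of interest). The percentage is calculated over all observations, including those not evaluated.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_flags(
    flags_by_station: Dict[str, List[int]]
) -> List[Tuple[str, int, int, int, float]]:
    """Return (station, n_pass, n_not_evaluated, n_suspect, percent_suspect) rows.

    Assumed flag codes: 1 = pass, 2 = not evaluated, 3 = suspect/of interest.
    """
    rows = []
    for station in sorted(flags_by_station):
        counts = Counter(flags_by_station[station])
        total = sum(counts.values())
        pct = 100.0 * counts[3] / total if total else 0.0
        rows.append((station, counts[1], counts[2], counts[3], round(pct, 1)))
    return rows


# Synthetic flag vectors for two hypothetical stations.
flags = {
    "station_b": [2, 1, 1, 1, 1, 2],
    "station_a": [2, 1, 3, 3, 1, 2],
}
print(f"{'station':<10} {'pass':>5} {'n/e':>5} {'suspect':>8} {'% suspect':>10}")
for station, n1, n2, n3, pct in summarize_flags(flags):
    print(f"{station:<10} {n1:>5} {n2:>5} {n3:>8} {pct:>10}")
```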